Tagging Complex NEs with MaxEnt Models: Layered Structures Versus Extended Tagset
Authors
Abstract
This paper discusses two policies for recognizing named entities (NEs) with complex structures using maximum entropy (MaxEnt) models. One policy is to build cascaded MaxEnt models that operate at different levels. The other is to design more detailed tags, informed by human knowledge, to represent complex structures. Experiments on Chinese organization name recognition indicate that layered structures yield more accurate models, whereas extended tags do not produce the expected improvement. We show empirically that the {start, continue, end, unique, other} tag set is the best tag set for NE recognition with MaxEnt models.
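To make the {start, continue, end, unique, other} tag set concrete, the following is a minimal sketch of how entity spans could be turned into per-token labels. The function name `encode_spans`, the half-open span convention, and the lowercase label strings are assumptions made for illustration; the paper's actual encoding and feature design are not given in the abstract.

```python
from typing import List, Tuple


def encode_spans(n_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Label each token with one of: start, continue, end, unique, other.

    `spans` holds non-overlapping entity spans as half-open (begin, end)
    token index pairs. This convention is an assumption for the sketch.
    """
    tags = ["other"] * n_tokens
    for begin, end in spans:
        if end - begin == 1:
            # A single-token entity gets its own label.
            tags[begin] = "unique"
        else:
            tags[begin] = "start"
            for i in range(begin + 1, end - 1):
                tags[i] = "continue"
            tags[end - 1] = "end"
    return tags


# Hypothetical five-token sentence with one three-token entity.
print(encode_spans(5, [(1, 4)]))
```

In a layered setup, one such label sequence would be produced per level, with the output of a lower-level model available as input to the next; the extended-tagset alternative would instead enlarge the label inventory in a single pass.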
Similar resources
High Accuracy Tagging with Large Tagsets
The paper presents experiments and results related to morpho-syntactic (MS) tagging of a highly inflectional language, based on combining language models (LM) learnt from multiple register-diversified corpora. To cope with a large tagset (614 tags), our underlying tagger uses a hidden smaller tagset (92 tags), mapped back, after the proper tagging, into the initial tagset. The same text is tagg...
Part-of-Speech Tagging of Portuguese Using Hidden Markov Models with Character Language Model Emissions
This paper presents a probabilistic approach to POS tagging that combines HMMs and character language models, applied to Portuguese texts. In this approach, the emission probabilities for each hidden state in an HMM are estimated by a proper character language model. The tagger built has been trained and tested on Bosque, a subset of Floresta Sintá(c)tica treebank, reaching 96.2% accuracy ...
Tagset Design and Inflected Languages
An experiment designed to explore the relationship between tagging accuracy and the nature of the tagset is described, using corpora in English, French and Swedish. In particular, the question of internal versus external criteria for tagset design is considered, with the general conclusion that external (linguistic) criteria should be followed. Some problems associated with tagging unknown word...
Tiered Tagging and Combined Language Models Classifiers
We address the problem of morpho-syntactic disambiguation of arbitrary texts in a highly inflectional natural language. We use a large tagset (615 tags), EAGLES and MULTEXT compliant [5]. The large tagset is internally mapped onto a reduced one (82 tags), serving statistical disambiguation, and a text disambiguated in terms of this tagset is subsequently subject to a recovery process of all the i...
Developing a tagset for automated part-of-speech tagging in Urdu
While part-of-speech tagging is an established technology for Western European languages such as English or Spanish, extending the technique to Urdu presents a range of interesting issues. There are some problems associated with the writing system, e.g. the problems of locating token boundaries in the Urdu version of the Arabic script. However, there are also linguistic issues. Litt...